A Two-step Approach to Video Retrieval based on ASR transcriptions

Authors

  • Ken Schmidt
  • Thomas Korner
  • Stephan Heinich
  • Thomas Wilhelm-Stein
Abstract

In this paper, we describe our experiments for the Rich Speech Retrieval Task at the MediaEval Benchmark Initiative 2011. We start with a brief overview of the framework we used and its structure. Our experiments indicate that a two-step retrieval approach combined with a spell checker can improve the quality of retrieval results in the given scenario. Finally, we discuss other techniques that may further improve the quality of the results.
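
The abstract does not say what the two retrieval steps are or which spell checker was used, so the following is only an illustrative sketch under stated assumptions. It assumes that step one ranks whole videos by their ASR transcript and step two ranks the segments inside the best video, and it stands in for a spell checker with fuzzy matching against the transcript vocabulary via Python's `difflib`. The function names, the term-count scoring, and the `0.8` cutoff are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def correct_query(query, vocabulary):
    """Map each query term that is not in the vocabulary to its closest
    vocabulary term, if one is similar enough (assumed cutoff: 0.8)."""
    known = set(vocabulary)
    corrected = []
    for term in tokenize(query):
        if term in known:
            corrected.append(term)
            continue
        match = get_close_matches(term, vocabulary, n=1, cutoff=0.8)
        corrected.append(match[0] if match else term)
    return corrected


def score(terms, text):
    """Sum of query-term frequencies in the text (a stand-in for a real
    ranking function such as BM25)."""
    counts = Counter(tokenize(text))
    return sum(counts[t] for t in terms)


def two_step_search(query, videos):
    """videos maps a video id to a list of transcript segments (strings).
    Returns (best video id, index of the best segment in that video)."""
    vocabulary = sorted(
        {tok for segs in videos.values() for seg in segs for tok in tokenize(seg)}
    )
    terms = correct_query(query, vocabulary)
    # Step 1 (assumed): rank whole videos by their full transcript.
    best_video = max(videos, key=lambda vid: score(terms, " ".join(videos[vid])))
    # Step 2 (assumed): rank the segments of the selected video.
    segments = videos[best_video]
    best_index = max(range(len(segments)), key=lambda i: score(terms, segments[i]))
    return best_video, best_index


videos = {
    "v1": ["welcome to the cooking show", "today we bake bread with yeast"],
    "v2": ["the football match starts", "a late goal decides the game"],
}
result = two_step_search("bred yeast", videos)
```

Here the misspelled query term `bred` is mapped to `bread` before ranking, so the search can still reach the segment about baking. A real system would likely correct against a proper dictionary and use a stronger ranking model.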

Similar resources

Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term Detection

This paper describes a novel method for estimating the correct phoneme sequence that uses a recurrent neural network (RNN)-based framework for spoken term detection (STD). In an automatic speech recognition (ASR)-based STD framework, ASR performance (word or subword error rate) affects STD performance. Therefore, it is important to reduce ASR errors to obtain good STD results. In this study, we use an ...

Video Shot Classification Using Lexical Context

Associating concepts with video segments is essential for content-based video retrieval. We present here a semantic classifier that works from text transcriptions produced by automatic speech recognition (ASR). The system is based on a Bayesian classifier and is fully linked with a knowledge base that contains an ontology and named entities from several domains. The system is trained from a set of ...

Building an ASR System for a Low-research Language Through the Adaptation of a High-resource Language ASR System: Preliminary Results

For many languages in the world, not enough (annotated) speech data is available to train an ASR system. We here propose a new three-step method to build an ASR system for such a low-resource language, and test four measures to improve the system’s success. In the first step, we build a phone recognition system on a high-resource language. In the second step, missing low-resource language acous...

Genre tagging of videos based on information retrieval and semantic similarity using WordNet

In this paper we propose a new approach for the genre tagging task of videos, using only their ASR transcripts and associated metadata. This new approach is based on calculating the semantic similarity between the nouns detected in the video transcripts and a bag of nouns generated from WordNet, for each category proposed to classify the videos. Specifically, we have used the Lin measure based ...

Combining Word and Phonetic-Code Representations for Spoken Document Retrieval

The traditional approach to spoken document retrieval (SDR) uses an automatic speech recognizer (ASR) in combination with a word-based information retrieval method. This approach has shown only limited accuracy, partly because ASR systems tend to produce transcriptions of spontaneous speech with a significant word error rate. In order to overcome this limitation we propose a method which use...

Publication date: 2011